home *** CD-ROM | disk | FTP | other *** search
- Path: ivan.iecc.com!not-for-mail
- From: mw@ipx2.rz.uni-mannheim.de (Marc Wachowitz)
- Newsgroups: comp.compilers,comp.lang.c++
- Subject: Re: Parsing C++ headers?
- Date: 16 Apr 1996 22:25:33 -0400
- Organization: Compilers Central
- Sender: johnl@iecc.com
- Approved: compilers@ivan.iecc.com
- Message-ID: <96-04-096@comp.compilers>
- References: <96-04-033@comp.compilers>
- NNTP-Posting-Host: localhost.iecc.com
- Keywords: C++, yacc, parse
-
- John Mitchell (104316.1514@CompuServe.COM) wrote:
- > I want to parse C++ header files, and extract information about the
- > contained classes and their interface. A yacc-able grammar for this
- > type of problem ( or superset of it, as for compilation) would be a
- > great help.
-
- You might also look at ftp://ftp.csl.sri.com/pub/dodd/btyacc.tar.gz -
- the README file is appended below. Using a relatively "understandable"
- (as far as one can say so for C++ ;-) grammar - such as from the ARM,
- or the latest working draft - and paying a limited performance penalty
- may be preferrable to a possibly somewhat faster "patchwork"-parser
- recognizing constructs as early as possible.
-
- -- Marc Wachowitz <mw@ipx2.rz.uni-mannheim.de>
-
-
-
- BTYACC -- backtracking yacc.
-
- BTYACC was created by Chris Dodd using ideas from many places and lots
- of code from the Berkeley Yacc distribution, which is a public domain
- yacc clone put together by the good folks at Berkeley. This code is
- distributed with NO WARRANTEE and is public domain. It is certain to
- contain bugs, which you should report to:
- dodd@csl.sri.com
-
- See the README.BYACC, NEW_FEATURES, and ACKNOWLEDGEMENTS files for more
- about Berkeley Yacc and other sources of info.
-
- BTYACC is a modified version of yacc that supports automatic
- backtracking and semantic disambiguation to parse ambiguous grammars,
- as well as syntactic sugar for inherited attributes (which tend to
- introduce conflicts). Whenever a btyacc generated parser runs into a
- shift-reduce or reduce-reduce error in the parse table, it remembers
- the current parse point (yacc stack and input stream state), and goes
- into trial parse mode. It then continues parsing, ignoring most rule
- actions. If it runs into an error (either through the parse table or
- through an action calling YYERROR), it backtracks to the most recent
- conflict point and tries a different alternative. If it finds a
- successful parse (reaches the end of the input or an action calls
- YYVALID), it backtracks to the point where it first entered trial
- parse mode, and continues with a full parse (executing all actions),
- following the path of the successful trial.
-
- Actions in btyacc come in two flavors -- {}-actions, which are only
- executed when not in trial mode, and []-actions which are executed
- regardless of mode. There are also inherited attributes, which look
- like arguments (they're enclosed in `()') and act like []-actions.
-
- What this buys you:
-
- * No more lexer feedback hack. In yacc grammars for C, a standard
- hack, know as the `lexer feedback hack' is used to find typedef names.
- The lexer uses semantic information to decide if any given identifier
- is a typedef-name or not and returns a special token. With btyacc,
- you no longer need to do this; the lexer should just always return an
- identifier. The btyacc grammar then needs a rule of the form:
-
- typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ]
-
- While the hack works adequately well for parsing C, it becomes a
- nightmare when you try to parse something like C++, where treating
- an ID as a typedef becomes heavily dependent on context.
-
- * Easy disambiguation via simple ordering. Btyacc runs its trials via
- the rule ``try shifting first, then try reducing by the order that the
- conflicting rules appear in the input file''. This means you can deal
- with semantic a disambiguation rule like:
- [1] If it looks like a declaration it is, otherwise
- [2] If it looks like an expression it is, otherwise
- [3] it is a syntax error
- [Ellis & Stroustrup, Annotated C++ Reference Manual, p93]
-
- To deal with this, you need only put all the rules for declarations
- before the rules for expressions in the grammar file.
-
- * No extra cost if don't use it. Backtracking is only triggered when
- the parse hits a shift/reduce or reduce/reduce conflict in the table.
- If you have no conflicts in your grammar, there's no extra cost, other
- than some extra code which will never be invoked.
-
- * C++ and ANSI C compatible parsers. The parsers produced by btyacc
- can be compiled with C++ correctly. If you `#define' YYSTYPE to be
- some C++ type with constructor and destructor, everything will work
- fine. My favorite is `#define YYSTYPE SmartPointer', where
- SmartPointer is a smart pointer type that does garbage collection on
- the pointed to objects.
-
- BTYACC was originally written to make it easy to write a C++ parser
- (my goal was to be able to use the grammar out of the back of the ARM
- with as few modifications as possible). Anyone who has ever looked at
- Jim Roskind's public domain C++ yacc grammar, or the yacc-based
- grammar used in g++ knows how difficult this is. BTYACC is very
- useful for parsing any ambiguous grammar, particularly ones that come
- from trying to merge two (or more) complete grammars.
-
- Inherited attributes in btyacc:
-
- Inherited attributes look a lot like function arguments to non-terminals,
- which is what they end up being in a recursive descent parser, but NOT
- how they're implemented in btyacc. Basically they're just syntactic
- sugar for embedded semantic actions and $0, $-1, ... in normal yacc.
- btyacc gives you two big advantages besides just the syntax:
- 1. it does type checking on the inherited attributes, so you don't
- have to specify $<type>0 and makes sure you give the correct`
- number of arguments (inherited attributes) to every use of a
- non-terminal.
- 2. It `collapses' identical actions from that are produced from
- inherited attributes. This eliminates many potential reduce-reduce
- conflicts arrising from the inherited attributes.
-
- You use inherited attributes by declaring the types of the attributes
- in the preamble with a type declaration and declaring names of the
- attributes on the lhs of the yacc rule. You can of course have more
- than one rule with the same rhs, and you can even give them different
- names in each, but the type and number must be the same.
-
- Here's a small example:
- %type <t1> lhs(<t1>, <t2>) /* lhs takes two inherited attributes */
- stuff(<t1>, <t2>)
- %%
- lhs($i1, $i2) : { $$ = $i1 }
- | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; }
-
- This is roughly equivalent to the following yacc code:
- lhs : { $$ = $<t1>-1; }
- | lhs [ $<t1>$ = $1; ] [ $<t2>$ = $<t2>0; ] stuff { $$ = $4; }
- ;
-
- See the file `test/t2.y' for a longer and more complete example.
- At the current time, the start symbol cannot have any arguments.
- --
- Send compilers articles to compilers@iecc.com,
- meta-mail to compilers-request@iecc.com.
-